Brief Announcement: A Concurrent Partial Snapshot Algorithm for Large-Scale and Dynamic Distributed Systems

Authors

Yonghwan Kim

Tadashi Araragi

Junya Nakamura

Toshimitsu Masuzawa

Abstract

Checkpoint-rollback recovery, a universal method for restoring distributed systems after faults, requires a sophisticated snapshot algorithm, especially when the system is large-scale, because repeatedly taking global snapshots of the whole system incurs unacceptable communication cost. One such algorithm is the partial snapshot algorithm, which, instead of a global snapshot of the whole system, takes a snapshot of a subsystem consisting only of the nodes that are communication-related to the initiator. In this paper, we modify the previous partial snapshot algorithm to create a new one that takes a partial snapshot more efficiently, especially when multiple nodes concurrently initiate the algorithm. Experiments show that the proposed algorithm greatly reduces the amount of communication needed for taking partial snapshots.

Key words: fault-tolerance, large-scale distributed system, concurrent snapshot, checkpoint, rollback
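To make the idea of a partial snapshot concrete, the following is a minimal sketch of how the set of participating nodes might be determined. It is not the algorithm from the paper: it assumes that "communication-related" means reachable from the initiator through a chain of message exchanges, and it computes that set centrally rather than with the distributed messages a real snapshot algorithm would use. The function name `partial_snapshot_group` and the `comm_links` mapping are hypothetical.

```python
from collections import deque


def partial_snapshot_group(initiator, comm_links):
    """Return the nodes assumed to be communication-related to `initiator`.

    `comm_links` is a hypothetical mapping from a node to the nodes it has
    exchanged messages with. Relations are treated as symmetric and
    transitive, which is an assumption of this sketch.
    """
    # Build a symmetric adjacency map from the recorded message exchanges.
    neighbors = {}
    for sender, receivers in comm_links.items():
        for receiver in receivers:
            neighbors.setdefault(sender, set()).add(receiver)
            neighbors.setdefault(receiver, set()).add(sender)

    # Breadth-first search from the initiator collects the subsystem.
    group = {initiator}
    queue = deque([initiator])
    while queue:
        node = queue.popleft()
        for peer in neighbors.get(node, ()):
            if peer not in group:
                group.add(peer)
                queue.append(peer)
    return group


# Two independent clusters: a-b-c and d-e.
links = {"a": ["b"], "b": ["c"], "d": ["e"]}
group_a = partial_snapshot_group("a", links)
group_d = partial_snapshot_group("d", links)
```

In this example a snapshot initiated by `a` involves only `a`, `b`, and `c`, while `d` and `e` are left undisturbed, which is the source of the communication savings over a global snapshot. When `a` and `d` initiate at the same time their groups are disjoint; the concurrent case the abstract targets arises when such groups overlap.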

Similar resources

Robust inter and intra-cell layouts design model dealing with stochastic dynamic problems

In this paper, a novel quadratic assignment-based mathematical model is developed for concurrent design of robust inter and intra-cell layouts in dynamic stochastic environments of manufacturing systems. In the proposed model, in addition to considering time value of money, the product demands are presumed to be dependent normally distributed random variables with known expectation, variance, a...

Serializable Snapshot Isolation in Shared-Nothing, Distributed Database Management Systems

NoSQL data storage systems provide high scalability and availability in exchange for limited transactional guarantees. In many cases, however, an application cannot give up transactional support but still needs the scalability provided by such systems. One approach for overcoming this limitation is to implement Snapshot Isolation (SI) on top of these systems. SI prevents most non-serializable e...

Adaptive Dynamic Data Placement Algorithm for Hadoop in Heterogeneous Environments

Hadoop MapReduce framework is an important distributed processing model for large-scale data intensive applications. The current Hadoop and the existing Hadoop distributed file system’s rack-aware data placement strategy in MapReduce in the homogeneous Hadoop cluster assume that each node in a cluster has the same computing capacity and a same workload is assigned to each node. Default Hadoop d...

A Genetic Based Resource Management Algorithm Considering Energy Efficiency in Cloud Computing Systems

Cloud computing is a result of the continuing progress made in the areas of hardware, technologies related to the Internet, distributed computing and automated management. The Increasing demand has led to an increase in services resulting in the establishment of large-scale computing and data centers, in addition to high operating costs and huge amounts of electrical power consumption. Insuffic...

Access control in ultra-large-scale systems using a data-centric middleware

The primary characteristic of an Ultra-Large-Scale (ULS) system is ultra-large size on any related dimension. A ULS system is generally considered as a system-of-systems with heterogeneous nodes and autonomous domains. As the size of a system-of-systems grows, and interoperability demand between sub-systems is increased, achieving more scalable and dynamic access control system becomes an im...

Journal:

IEICE Transactions

Volume 97-D, issue

Pages -

Publication date 2011
